Targeting Specific Distributions of Trajectories in MDPs
نویسندگان
چکیده
We define TTD-MDPs, a novel class of Markov decision processes where the traditional goal of an agent is changed from finding an optimal trajectory through a state space to realizing a specified distribution of trajectories through the space. After motivating this formulation, we show how to convert a traditional MDP into a TTD-MDP. We derive an algorithm for finding non-deterministic policies by constructing a trajectory tree that allows us to compute locally-consistent policies. We specify the necessary conditions for solving the problem exactly and present a heuristic algorithm for constructing policies when an exact answer is impossible or impractical. We present empirical results for our algorithm in two domains: a synthetic grid world and stories in an interactive drama or game.
منابع مشابه
Authorial Idioms for Target Distributions in TTD-MDPs
In designing Markov Decision Processes (MDP), one must define the world, its dynamics, a set of actions, and a reward function. MDPs are often applied in situations where there is a clear choice of reward functions and in these cases significant care must be taken to construct a reward function that induces the desired behavior. In this paper, we consider an analogous design problem: crafting a...
متن کاملHindsight Optimization for Hybrid State and Action MDPs
Hybrid (mixed discrete and continuous) state and action Markov Decision Processes (HSA-MDPs) provide an expressive formalism for modeling stochastic and concurrent sequential decision-making problems. Existing solvers for HSA-MDPs are either limited to very restricted transition distributions, require knowledge of domain-specific basis functions to achieve good approximations, or do not scale. ...
متن کاملAnother look at search-based drama management
A drama manager (DM) monitors an interactive experience, such as a computer game, and intervenes to shape the global experience so it satisfies the author’s expressive goals without decreasing a player’s interactive agency. In declarative optimization-based drama management (DODM), the author declaratively specifies desired properties of the experience; the DM optimizes its interventions to max...
متن کاملBiomechanical Investigation of Empirical Optimal Trajectories Introduced for Snatch Weightlifting
The optimal barbell trajectory for snatch weightlifting has been achieved empirically by several researchers. They have studied the differences between the elite weightlifters’ movement patterns and suggested three optimal barbell trajectories (type A, B, and C). But they didn’t agree for introducing the best trajectory. One of the reasons is this idea that the selected criterion by researchers...
متن کاملIdentification of bovine, ovine and caprine pure and binary mixtures of raw and heat processed meats using species specific size markers targeting mitochondrial genome
A specific polymerase chain reaction (PCR) method was applied for identification of bovine (Bos taurus), ovine (Ovis aries) and caprine (Capra hircus) pure and binary mixtures of raw and heat-processed meats. These meats are used in food industry products and/or for direct consumption of consumers. The mitochondrial DNA was amplified as a template in a PCR reaction by use of specific primers re...
متن کامل